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Spectral Compression Algorithms for the Analysis of 
Very Large Multivariate Images 

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims the benefit of U.S. Provisional Application No. 
5 60/472,447, filed May 20, 2003, which is incorporated herein by reference. 
This application is related to SD7213, "Spatial Compression Algorithms for 
the Analysis of Very Large Multivariate Images," filed of even date with this 
application. 

STATEMENT OF GOVERNMENT INTEREST 

10 This invention was made with Government support under contract no. 

DE-AC04-94AL85000 awarded by the U. S. Department of Energy to Sandia 
Corporation. The Government has certain rights in the invention. 

FIELD OF THE INVENTION 

The present invention relates to multivariate image analysis and, in 
15 particular, to methods for compressing data and using block algorithms to enable 
the analysis of very large multivariate images. 

BACKGROUND OF THE INVENTION 

Comprehensive chemical characterization of a complex microstructure is a 
daunting task. Recently, full spectrum imaging instruments have become 

20 available that can collect a complete spectrum at each point in a spatial array 
and which promise to provide the data needed to perform such a 
characterization. A remaining hurdle is the computational difficulty of reducing 
the very large quantities of raw spectral data to meaningful chemical information. 
Analysis of full spectrum images having sizes that can be collected with 

25 current instrumentation can outstrip the computational resources typically found 
in the laboratory. For example, a typical commercial energy dispersive x-ray 
imaging system can produce a 1024 x 1024 pixel (one megapixel) image with 
each pixel being represented by a complete 1024 channel x-ray spectrum. Four 
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Gbytes of computer memory are required just to hold a single precision floating 
point representation of this data set. To put this number in context, 4 Gbytes 
equals the full physical address space of a modern 32-bit microprocessor. This 
data overload precludes the use of standard analysis algorithms, which require 
5 that all of the data to be in the computer's physical memory at one time. 

A related issue is the computational complexity of the analysis algorithms. 
While the time required to fully read a given data set obviously scales linearly 
with the size of the data set, the analysis algorithms may scale up more quickly, 
leading to unacceptable computation times. To take one example, given a m x p 

10 matrix with m > p, Singular Value Decomposition (SVD), a popular method for 
effecting Principal Components Analysis (PCA), requires computing time 
proportional to m x p 2 . 

Researchers have taken a number of different approaches in attempting to 
overcome the foregoing computational problems. Current approaches to 

15 analyzing very large spectrum images range from downsampling to simply 
omitting data to various data compression schemes. In the context of 
spectroscopic data, the general idea behind data compression is to represent a 
high dimensional spectrum in terms of a lower dimension basis. Convenient 
bases include B-splines and wavelets. A principal component basis derived from 

20 PCA can yield good compression, provided the PCA can be computed. While 
these approaches are effective at reducing the size of the problem to be solved, 
they suffer the potential drawback that they work with an approximation to the 
data rather than the data itself. Whether or not the approximation is acceptable 
depends on the details of the data and the problem at hand. See, e.g., J. 

25 Andrew and T. Hancewicz, "Rapid Analysis of Raman Image Data Using Two- 
Way Multivariate Curve Resolution, Applied Spectroscopy 52 , 797 (1998); B. 
Alsberg and O. Kvalheim, "Speed improvement of multivariate algorithms by the 
method of postponed basis matrix multiplication Part I. Principal component 
analysis", Chemometrics Intell, Lab. Svst . 24, 31 (1994); H. Kiers and R. 

30 Harshman, "Relating two proposed methods for speedup of algorithms for fitting 
two- and three-way principal component and related multilinear models", 
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Chemometrics Intell. Lab. Svst . 36, 31 (1997); F. Vogt and M. Tacke, "Fast 
principal component analysis of large data sets", Chemometrics Intell. Lab Svst . 
59, 1 (2001); and F. Vogt and M. Tacke, "Fast principal component analysis of 
large data sets based on information extraction," J. Chemometrics 16 , 562 
5 (2002). 

The current invention overcomes these limitations by combining 
compression strategies with image analysis algorithms that operate directly on 
the compressed data. Spectral compression is described herein using an 
exemplary PCA-factored representation of the data. Furthermore, a block 

10 algorithm can be used for performing common operations more efficiently. A key 
advantage of the block algorithm is that at no point is it required to have all of the 
data residing simultaneously in the computer's main memory, enabling the 
operations to be performed on larger-than-memory data sets. For example, the 
block algorithm can be used that is suitable for out-of-core kernel PCA and a 

15 Multivariate Curve Resolution - Alternating Least Squares (MCR-ALS) procedure 
that directly employs a PCA representation of the data. The crossproduct matrix 
that is formed during the PCA compression using the block algorithm is 
mathematically identical to the one that would be computed if the entire data set 
could have been held in memory. Consequently, data sets that are larger than 

20 memory are readily analyzed. The performance of the algorithm is characterized 
for data sets ranging in size to upward of one billion individual data elements. 
For sufficiently large data sets, an out-of-core implementation of the algorithm 
extracts only a minor performance penalty over a fully in-core solution. 

Multiresolution spatial processing, or spatial compression, can be used 

25 separately or in combination with the spectral compression algorithm and image 
analysis techniques to provide speed improvements without the loss of spatial or 
spectral resolution. Spatial compression is described herein using an exemplary 
Haar wavelet transform. Multiresolution techniques have been used in traditional 
image processing and, in fact, serve as the basis of the recently introduced 

30 JPEG2000 image compression standard. Similar ideas can be applied to 

spatially compress high-spectral-dimension images. The basic idea is to reduce 
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the number of pixels analyzed to a smaller number of coefficients that capture all 
of the spatial information of interest. Using orthogonal wavelets to perform this 
compression allows standard multivariate curve resolution techniques to be 
applied to the coefficients themselves rather than the raw data. This can lead to 
5 a tremendous reduction in both the computational resources and time required to 
complete the analysis. In addition to improved computational performance, the 
multiresolution approach can also yield improved sensitivity to minor constituents 
for data having low signal to noise ratios. This is particularly true if the 
multiresolution spatial filters are matched to the features of interest in the sample. 

10 SUMMARY OF THE INVENTION 

The method of the present invention is directed to spectral compression 
algorithms, using a factored representation of the data, for the analysis of very 
large multivariate images. Using a block algorithm, a block of data is read from a 
disk. The block is suitably sized to fit in core memory and consists of either the 

15 full spectral data at some number of pixels or full image planes for some number 
of spectral channels. The block of data can be spatially compressed. For the 
typical case that the number of pixels m is greater than the number of spectral 
channels p, the pixel-space of the data block can be weighted, if desired, and 
spectral information can be accumulated for subsequent weighting of the spectral 

20 space. The crossproduct of the pixel-weighted data is computed and summed 
into a crossproduct matrix. After all of the blocks of data have been read in, 
spectral weighting of the crossproduct matrix is performed, if required. The 
eigenvalues and eigenvectors of the crossproduct matrix are computed and 
sorted. Based on the number of factors desired in the model, a ranked loadings 

25 matrix can be computed. A ranked scores matrix can be computed blockwise by 
re-reading the data blocks and post multiplying them by the ranked loadings. 

The spectral compression algorithms can be combined with a spatial 
compression algorithm to provide further computational efficiencies. If spatial 
compression is used and the data is compressed sufficiently to be stored 

30 completely in memory, the spatially compressed data can be directly multiplied 
by the loadings to compute spatially compressed scores. At the same time, if 
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spatial compression is used and a full resolution result is desired, the full spatial 
resolution scores matrix can be computed and stored. 

Image analysis, for example MCR-ALS, is performed using the 
compressed, PCA-factored representation of the data. If the data is not spatially 
5 compressed and the full resolution result is not desired, the calculation is 
complete. If it is spatially compressed, then the full resolution scores and 
loadings can be used to project the PCA-compressed data onto the pure spectral 
components to reconstruct the component maps with full spatial resolution. 

Alternatively, if p > m, the roles and interpretations of the factors can be 
10 reversed and a ranked scores matrix can be constructed from the eigenanalysis 
and a ranked loadings matrix can computed from the ranked scores matrix and 
the re-read data blocks. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The accompanying drawings, which are incorporated in and form part of 
15 the specification, illustrate the present invention and, together with the 

description, describe the invention. In the drawings, like elements are referred to 
by like numbers. 

FIG. 1 shows the memory hierarchy for a conventional personal computer. 

FIG. 2 shows a memory arrangement for an m x n matrix A stored in 
20 column-major order. 

FIG. 3 shows a MATLAB® code to demonstrate the performance 
difference between row-oriented and column-oriented algorithms for arrays 
stored in column-major order. 

FIG. 4 shows a block algorithm for the spectral compression of a full 
25 spectrum image using PCA of the raw spectral data. 

FIG. 5 shows a block algorithm for the spectral compression using PCA of 
a factored representation of the data. 

FIG. 6 shows a method for the image analysis of a PCA-factored 
representation of the data using MCR-ALS. 



6 



SD7276 
Keenan 

FIG. 7 shows the first six loading vectors and corresponding score maps 
from standard PCA of the energy dispersive x-ray (EDX) image of a series of 
wires of varying composition embedded in an epoxy block. The images are 128 
x 128 pixels in size. 

5 FIG. 8 shows the time required to compute a truncated PCA model for the 

wires image as a function of the number of principal components to be retained 
in the model. 

FIG. 9A shows the total spectral intensity image of the cross-sectioned 
wires embedded in the epoxy block. FIG. 9B shows a one-level Haar wavelet 
10 transformation of the image into four subimages. 

FIG. 10 shows a conceptual representation of a Haar wavelet 
transformation applied to an m-row x n-column x p-channel data cube. 

FIG. 1 1 illustrates conceptually the spatial compression of either raw or a 
spectrally compressed representation of the data. 
15 FIG. 12 illustrates conceptually two methods for obtaining the component 

maps at full resolution from the image analysis of the compressed data. 

FIGs. 13A-13D show a MCR-ALS analysis of a spatially compressed EDX 
image of a complex geological material. FIG. 13A shows a scanning electron 
micrograph of the geological sample. FIG. 13B shows the results of the MCR- 
20 ALS analysis of the EDX image compressed to the level four approximation. 
FIG. 13C shows the level four compressed image of the silicate component. 
FIG. 13D shows the silicate image reconstructed from the compressed image of 
the silicate component. 

FIG. 14 shows scaled eigenvalues for the sorted principal components of 
25 the uncompressed image of the geological sample and of four levels of 
compression using Haar wavelets. 

FIGs. 15A and 15B show MCR-ALS analysis of an alumina/braze 
interface. FIG. 15A shows the distributions of the glass and alumina phases 
obtained at full spatial resolution. FIG. 15B shows the corresponding 
30 components obtained after 7 levels of compression along the row dimension and 
no compression along the column dimension. 
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DETAILED DESCRIPTION OF THE INVENTION 

As problem sizes begin to test the limits of available computing resources, 
it becomes necessary to pay close attention to the details an algorithm's 
implementation in order to achieve optimal or even acceptable performance. 
5 How an algorithm is structured with respect to the computer system's memory 
architecture is perhaps the foremost influence in this regard. 

In the present invention, by paying careful attention to data locality, 
efficient block algorithms can be devised for performing common operations on 
data sets. The block algorithms enable the analysis of larger-than-memory data 

10 sets and the improved performance for data sets that are small enough to fit 
entirely within memory. Contrary to conventional wisdom, for linear algebra 
operations typical of chemometrics, movement between disk storage and main 
memory can be minimized such that disk access time can be a small part of the 
total computation time for the analysis of large data sets. Alternatively, 

15 performance can be improved for smaller data sets by minimizing the movement 
between main memory and cache memory. To demonstrate these ideas, a block 
algorithm has been designed and implemented that enables the more efficient 
use of a computer's memory to push back apparent memory limitations. 

In FIG. 1 is shown the memory hierarchy of a typical personal computer, 

20 where memory is viewed expansively to include any stored data that could be 
made available to the processor. In general, speed goes up but capacity goes 
down as we move up the hierarchy toward the processor. As a problem size 
grows, we are forced to move down the hierarchy to gain access to sufficient 
memory, at the cost of losing performance. See K. Gallivan et a/., "Impact of 

25 Hierarchical Memory Systems on Linear Algebra Algorithm Design," Int. J. 
Supercomputer Applications 2. 12 (1988). 

To make efficient use of the memory hierarchy, it is necessary to design 
an algorithm with data locality in mind. Two flavors of locality are generally 
considered. Temporal locality relates to the tendency for a given memory 

30 location to be accessed repeatedly over a short period of time. Utilizing spatial 
locality, on the other hand, means designing an algorithm to access nearby 
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memory locations together. In terms of the memory hierarchy, exploiting locality 
and designing for performance means minimizing data flow between different 
levels of the hierarchy while trying to perform as many operations as possible in 
the higher levels. These ideas have been critical in obtaining high performance 
5 in distributed computing environments where inter-processor communication 
poses a bottleneck, but they have been paid relatively scant attention by users of 
stand-alone computers. 

That locality is important in the latter environment, however, can be easily 
demonstrated by a simple example. Consider programming in the popular 

10 MATLAB® environment. See MATLAB Version 6.5, The Mathworks, Natick, MA. 
MATLAB® stores matrices in column-major order, as depicted by matrix A in FIG. 
2. Thus, matrix elements that are adjacent in a column are, in fact, adjacent in 
memory, whereas, elements that are adjacent in a row are separated in memory 
by the number of rows in the matrix. To make the most of spatial locality, in this 

15 case, algorithms should be designed to operate on columns of the matrix rather 
than on rows. 

FIG. 3 shows two MATLAB® code fragments. These scale the rows and 
columns of a matrix, respectively, by the elements of a vector. Using a Windows 
2000-based computer with a 1 GHz Pentium III processor, 1 Gbyte of main 

20 memory and a 256-Kbyte L2 cache, the row-oriented algorithm completed in 12.5 
sec, whereas the column-oriented algorithm took only 1 .8 sec. Note that exactly 
the same mathematical operations are being performed in both cases. The 
improvement in performance by almost a factor of 7 can be attributed to good 
utilization of spatial locality in the column-oriented algorithm. Since the adjacent 

25 elements in a row of A are separated by a distance that is greater than the cache 
size, the row-oriented algorithm makes very poor use of cache memory and 
requires many more accesses to the slower main memory, resulting in degraded 
performance. 

The preceding example and discussion were concerned with performance 
30 in the case that a problem is too large to be contained fully in cache memory, but 
which did fit within the computer's main memory. As problems become too large 
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to be contained in main memory, the next level in the memory hierarchy, the local 
disk storage, comes into play. Data analysts have some experience with using 
disk storage to mimic main memory in the virtual memory capability of modern 
operating systems. This experience has invariably been negative from the 
5 standpoint of numerically intensive computation. Once the operating system 
begins "swapping to disk," the speed of numerical algorithms is brought to a 
crawl. However, it should be kept in mind that the disk usage, in this case, is 
being controlled by the operating system - not by the algorithm. By explicitly 
considering disk usage when designing large-scale algorithms, rational choices 

10 about disk usage can be made up front and disk-based memory can provide an 
effective solution to the larger-than-memory problem. The design considerations 
when using disk storage are the same as the ones discussed previously - 
minimize the flow of data between levels of the memory hierarchy and perform as 
many calculations as possible at the highest level possible. 

15 A key to making efficient use of disk storage is the development of block 

algorithms for common linear algebra operations. See G. H. Golub and C. F. 
Van Loan, Matrix Computations, 3 rd Ed., The John Hopkins University Press, 
Baltimore, MD (1996). Block techniques have played an important role in 
designing high performance algorithms both on hierarchical uniprocessor 

20 systems and in parallel computing environments. 

The block algorithm can be combined with data compression algorithms to 
enable the efficient analysis of very large multivariate images. This approach will 
be illustrated in the following description by the development of a block algorithm 
for performing PCA that is suitable for use in applications in which the entire data 

25 set cannot be contained simultaneously within main memory. When performing 
PCA with the block algorithm, the algorithm is capable of producing a 
mathematically exact factorization of a data matrix into scores and loadings. 
Substantial speed improvements can then be achieved by truncating the 
factorization to the number of chemically relevant factors either known a priori or 

30 estimated from the data. The algorithm can also accommodate several common 
preprocessing operations such as mean-centering, autoscaling, and weighting by 
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a diagonal or block-diagonal matrix. The block algorithm for PCA can also be 
used with the multiresolution spatial processing either pre- or post-factorization. 

Multivariate Spectral Analysis 
Multivariate spectral analysis can be applied to spectral data produced by a 
5 variety of spectroscopic imaging techniques, including: Electron Probe 
Microanalysis (EPMA), Scanning Electron Microscopy (SEM) with attached 
Energy Dispersive X-Ray Spectrometer (EDX), X-Ray Fluorescence (XRF), 
Electron Energy Loss spectroscopy (EELS), Particle Induced X-ray Emission 
(PIXE), Auger Electron Spectroscopy (AES), gamma-ray spectroscopy, 
10 Secondary Ion Mass Spectroscopy (SIMS), X-Ray Photoelectron Spectroscopy 
(XPS), Infrared Spectroscopy (IR), Raman Spectroscopy, Magnetic Resonance 
Imaging (MRI) scans, Computerized Axial Tomography (CAT) scans, IR 
reflectometry, Mass Spectrometry (MS), multidimensional 
chromatographic/spectroscopic techniques, hyperspectral remote imaging 
15 sensors, etc. 

In general, an image can comprise any arbitrary, multidimensional array of 
points. The image can include a spatial dimension, such as lines, traditional 2D 
images, or 3D volumes; or a temporal dimension, such as a time series of 
images. In the case of a gas chromatography / mass spectroscopy (GC/MS) 

20 image, the dimension is a separation coordinate. The multivariate image 

analysis techniques will be described herein in reference to a spatial dimension. 
However, it is understood that the techniques can be applied also to non-spatial 
images, such as those comprising a time series or chromatographic coordinate. 
In general, multivariate spectral analysis for chemical characterization of a 

25 sample can include: determining the number of chemical species (pure elements 
and chemical phases or alloys) that comprise an inhomogeneous mixture being 
imaged; extracting the spectra of these "pure" components (elements or phases); 
quantifying the amount or concentration of each component present in the 
sample; and mapping the spatial distribution of these components across the 

30 sample. 
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Multivariate spectral analysis can be performed on a full spectrum image 
that can be represented as a two-dimensional data matrix D. A two-dimensional 
data matrix D can be obtained by unfolding a measured multidimensional 
spectral data set D . For example, the multidimensional spectra data set D can 
5 be a data cube that comprises a 1 D spectrum at each pixel on a 2D spatial grid 
corresponding to the (X,Y) coordinates of the pixel's location. The 2D data 
matrix D enables the easy and efficient use of standard linear algebra and matrix 
operations. 

The data matrix D can be factored into the product of two matrices, C and 
10 S T , according to: 

D = CS T (1) 

where D has dimensions of m x p, and m is the number of pixels and p is the 
number of spectral channels. The matrix C is a concentration matrix, which is 
related to the concentration of the chemical phases (e.g., a matrix representing a 

15 map of the component abundances) and has dimensions of m x qr, where q is the 
number of pure components. The matrix S is a spectral shapes matrix, which 
contains information about the spectral shapes of the pure chemical components 
(e.g., a matrix of the pure component spectra). S has dimensions p x qr. 

The factorization of Eq. (1) can be accomplished by an image analysis of 

20 D. A number of prior image analysis methods are described in U.S. Patent No. 
6,675,106 to Keenan and Kotula and U.S. Patent No. 6,584,413 to Keenan and 
Kotula, which are incorporated herein by reference. A preferred image analysis 
method comprises a constrained MCR-ALS analysis of Eq. (1 ). A variety of 
constraint conditions can be used. For example, the constraint condition can be 

25 a non-negativity constraint. Alternatively, the constraint condition can be chosen 
to track a. physical attribute. The concentrations C can be constrained to be 
either monotonically increasing or decreasing over time, for example, when 
monitoring a chemical reaction over time. Alternatively, the concentrations C or 
spectral shapes S can be constrained to be unimodal. Alternatively, the spectral 

.30 shapes S can be constrained to a fixed number of spectral peaks (e.g., a single 
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peak for chromatography, or, to less than three peaks). Alternatively, the 
spectral shapes S can be constrained to match the shape of a Gaussian 
distribution. The factors can also be partially constrained. For example, the 
concentration of a particular species can be zero at a particular location and 
5 unconstrained elsewhere. 

In an exemplary MCR-ALS analysis, non-negativity constraints can be 
applied to the chemical concentrations and spectral intensities. One overall 
procedure for applying non-negativity constraints is described by R. Bro and S. 
De Jong, "A Fast Non-Negativity-Constrained Least Squares Algorithm", Journal 
10 of Chemometrics 11_, 393 (1997), which is incorporated herein by reference. 

In a non-negativity constrained MCR-ALS analysis, an initial feasible 
estimate can be made for S (Bro starts with all zeros as an initial feasible 
solution). Eq. (1 ) can then be solved for C under the non-negativity constraint: 

mjnjD - CS t || f , subject to C > 0. (2) 

15 where F represents the Frobenius norm, which is computed as the square root of 
the sum of the squared elements of a matrix. Given the first estimate of C 
obtained by solving Eq. (2), Eq. (1) can be solved for S under the non-negativity 
constraint: 

min||D - CS t || f , subject to S > 0. (3) 

20 Next, a convergence metric can be computed and compared to a convergence 
criterion. If the convergence metric is less than the convergence criterion, then 
convergence has been achieved and the procedure is completed. If not, then the 
steps are repeated (/.e. f iterated) as many times as is needed until an acceptable 
(i.e., sufficient) level of convergence is achieved or until a fixed number of 

25 iterations have been completed. In general, the goal of the MCR-ALS analysis is 
to achieve a least squares solution to Eq. (1) by testing for convergence, 
according to: 

minlb - CS T || , subject to C > 0 and S > 0. (4) 

C,S II HF 
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The previous discussion assumed that the MCR-ALS analysis started by 
making an initial estimate for S and then solving Eq. (2) for C, and so on. 
However, the roles of C and S can be easily interchanged by making an initial 
feasible estimate for C and first solving Eq. (3) for S. 
5 If desired, the measured data matrix D can be weighted, depending on the 

type of experiment being analyzed, and depending on the properties of the noise 
or background signal generated during data acquisition. Weighting is generally 
used whenever the properties of the noise are not uniform throughout the 
measurement space (i.e., heteroscedastic noise). This is particularly true in the 

10 case of "counting" experiments in which the noise is characterized as following a 
Poisson probability distribution in which the magnitude of the uncertainty varies 
with the magnitude of the signal. For multi-channel data acquisition the noise is 
not characterized by a single probability distribution, but rather, by a distribution 
whose parameters will, in principle, differ from channel to channel and from pixel 

15 to pixel within a channel. Additionally, heteroscedasticity can also arise from 
other effects, e.g., non-uniform detector responses, or mathematical 
transformations applied to the data (e.g., taking logarithms). 

Weighting is also useful when there is a large disparity between the total 
number of counts (e.g., observations) arising from different elements or phases 

20 (e.g., a sample comprising a small amount of contaminant located on top of a 
substrate made of a single material). Weighting, therefore, is useful for 
accurately identifying minor phases, trace elements, or subtle gradients in 
composition across a sample. By properly accounting for experimental noise 
characteristics, chemically relevant features having small number of counts can 

25 become more significant, in a least squares sense, than large magnitude noise 
associated with major spectroscopic features. 

Weighting can also be used to account for data "outliers". Data outliers 
can include malfunctioning energy channels or pixel elements in a detector array. 
For example, a dead (i.e., inoperative) energy channel (e.g., zero signal) can be 

30 effectively removed from the data matrix D by assigning a sufficiently small 

weight. Alternatively, for detectors that use a CCD pixel array, an occurrence of 
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a "hot" (i.e., saturated) or dead pixel can be weighted in a similar fashion, 
namely, by assigning a sufficiently small weight. 

Data matrix D can be weighted to create a weighted data matrix D 
according to: 

5 D = GDH (5) 

where G is a pre-multiply weighting matrix having dimensions m x m\ and H is a 
post-multiply weighting matrix having dimensions p x p. In general, G is used to 
weight the row-space of D, and H is used to weight the column-space of D. 
Obviously, the weighting matrices G and H can be the identity matrix if no 

10 weighting is desired. 

The matrix G can be used to account for unequal variance in the 
observations from pixel-to-pixel, independent of channel number (/.e., energy or 
wavelength). For example, weighting by G could be used in the case where 
there are hot or dead pixels, or if there was an array detector where different 

15 detector elements had different noise properties. Weighting by G also could be 
used in the situation where there were much higher total counts in some pixels 
as compared to others, caused, for example, by unequal dwell times. 

The matrix H can account for unequal variance in the observations from 
channel-to-channel, independent of pixel location. For example, weighting by H 

20 could be used when the noise level varies from one energy channel to the next. 
For X-ray detection in spectral analysis, the noise will be higher on average in 
channels that have greater signal on average due to the Poisson nature of a 
counting experiment. 

If the data matrix D is weighted, the test for convergence of the 

25 constrained ALS solution is: 



mm 



GDH - GCS t H|| f = min p - CS t || f (6) 



where D is the weighted data matrix, C = GC is the weighted concentration 
matrix, and S T = S T H is the weighted spectral shapes matrix. Eq. (6) can be 
solved with appropriate constraints applied to C and S T , as described above. 

15 
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After estimating the weighted concentrations and spectra, the corresponding 
unweighted concentrations and spectra can be recovered as C = CG" 1 and 

S T = S T H" 1 . For simplicity, the analysis of the unweighted data matrix D will be 
described hereinafter, although it will be understood that the method of the 
5 present invention can also be applied to the weighted data matrix D . 

Spectral Compression using a Block PCA Algorithm 
The present invention is directed to spectral compression algorithms for 
the analysis of very large multivariate images. The spectral compression 
algorithms can be combined with spatial compression algorithms for additional 

10 computational efficiencies. The spatial compression algorithms, described 
below, are the subject of the related U.S. Patent Application SD7213, "Spatial 
Compression Algorithms for the Analysis of Very Large Multivariate Images." 

The prior MCR-ALS image analysis method described above, using the 
full data matrix D, the concentration matrix C, and the spectral shapes matrix S, 

15 requires a large amount of memory and computation time. For example, the 
computation time of C T D in Eq. (3) is proportional to m x p x q. 

The computational complexity can be reduced through spectral 
compression of the data matrix using PCA. PCA is one of the core techniques of 
multivariate statistical analysis and it has been employed in numerous and 

20 diverse applications including dimensional reduction, data compression, 

exploratory data analysis, factor analysis, pattern recognition, classification, and 
multivariate calibration. In particular, PCA can be used in the analysis of 
hyperspectral image data. See P. Geladi and H. Grahn, Multivariate Image 
Analysis, Wiley, Chinchester, UK (1996), which is incorporated herein by 

25 reference. 

The goal of PCA is to extract the useful information in a high-dimension 
data set into a lower dimension subspace. From a geometric point of view, PCA 
begins by finding that single direction in the p-dimensional space that best 
describes the location of the data. The vector describing that direction is the first 
30 principal component. Once found, a second direction, orthogonal to the first, is 
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determined that best accounts for the variation in the data that is orthogonal to 
the first. This is the second principal component. The process continues with 
each new principal component maximally accounting for the variation in the data 
that is orthogonal to all preceding components. The first few principal 
5 components will contain the chemical information of interest. If there are r such 
components, the remaining p - r components are assumed to describe 
experimental noise or error. Limiting further analysis to the r-dimensional 
subspace defined by the first r principal components provides the desired 
dimensional reduction and data compression. 
10 In matrix terms, PCA is concerned with factoring a data matrix D into the 

product of two other matrices, a scores matrix T whose columns are mutually 
orthogonal and a matrix P of orthonormal loading vectors, according to: 

D = TP T (7) 

To take a spectroscopic example, the loading vectors P describe the spectral 
15 characteristics of the chemical constituents of a sample and the scores T are 
related to their concentrations. 

PCA is closely related to the SVD, which is often used to compute the 
PCA. SVD performs the factorization: 

D = USV T (8) 

20 If D is an m x p matrix, then U and V are m x m and p x p orthogonal matrices, 
respectively, and £ is an m x p diagonal matrix containing the singular values 
along the diagonal, ordered by decreasing size. The right singular vectors V 
provide abstract representations of the spectra of the individual chemical 
components (e.g., elements or phases). The min(A77, p) singular values are 

25 related to the amount of variance in the data that is accounted for by the 

corresponding principal components. Specifically, the t h singular value is equal 
to the square root of the variance accounted for by the P principal component. 
The diagonal form indicates that the transformed data are uncorrelated. By 
decomposing the data into a set of uncorrelated factors of decreasing statistical 
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significance, data compression can be accomplished by selecting those factors 
having the greatest statistical significance and discarding the rest as noise or 
error. 

SVD has the useful property that the space spanned by the first r columns 
5 of V represents, in a least squares sense, the best rank r approximation to the 
space spanned by the rows of D. The remaining p - r columns of V represent 
experimental noise or error and can be discarded. Thus, the scores and loading 
matrices can be truncated, or compressed, to contain only those vectors 
corresponding to significant singular values. Letting V r be the matrix whose 
10 columns are the first r columns of V, a r-component PCA model can then be 
computed, according to: 

P = V r (9) 

f = (UZ) f -DP (10) 

where Eq. (1 0) follows from the orthonormality of the columns of V r . Therefore, 

15 f is the spectrally compressed scores matrix and P is the spectrally 
compressed loadings matrix. 

In the case of large spectral images, SVD becomes a very inefficient way 
to compute the principal components. As described below, in the typical case 
that the number of pixels, m, is greater than the number of spectral channels, p, 

20 a better approach is to consider the p x p crossproduct, or kernel matrix D T D. 
Those skilled in the art will appreciate that the methods of the present invention 
can also be applied to cases where p > m, in which case the crossproduct matrix 
would be computed as DD T . See W. Wu, D. Massart, and S. de Jong, The 
kernel PCA algorithms for wide data. Part I: theory and algorithms", 

25 Chemometrics Intell. Lab Svst . 36, 165 (1997), which is incorporated herein by 
reference. 

Starting with Eq. (8), the crossproduct matrix can be written 

D T D = VZ T U T UEV T = V(E T S)V T = VEV T (1 1 ) 
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Eq. (1 1 ) has the form of a standard symmetric matrix eigenvalue problem, which 
is readily solved by standard methods. E, in this equation, is a p x p diagonal 
matrix of eigenvalues sorted in descending order. Given V, the PCA model can, 
once again, be computed according to Eqs. (9) and (10), where only the first r 

5 columns of V are used to compute the compressed loading matrix P and the 

compressed scores matrix f . 

Alternatively, the data may be presented as an arbitrary f-component 
factor model according to: 

D = AB T (12) 

10 where A and Baremxf and p xf data factor matrices, respectively. Therefore, 
the data factor matrix A comprises the spatial information and the data factor 
matrix B comprises the spectral information. The f most significant eigenvalues 
E f of D T D can be obtained from the solution to the generalized symmetric 
eigenvalue problem 

15 (A T A)(B T B)Y = EfY (13) 

and the corresponding f most significant eigenvectors of D T D can be computed, 
according to: 

Vf = BY (14) 

A r-component PCA model for D, given r< f y can then be computed, according 
20 to: 

P = BY r (15) 

f = A(B T B)Y r (16) 

where the Y f is the matrix whose columns are the first r columns of Y. 

While the eigenvectors V include all of the information contained in the 
25 pure spectral components, they do not do so in a chemically recognizable form. 
Therefore, a single principal component will not, in general, represent either a 
pure element or a multi-element phase, but rather, a linear combination of such 
elements or phases. In other words, there may not be a one-to-one 
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correspondence between a selected principal component and a particular 
chemical phase or pure element. For example, physically admissible 
concentrations must be non-negative, and the pure spectral components must be 
greater than the background signal, whereas a general principal components 
5 analysis need not be thus constrained. The principal components produced by 
PCA often have negative values, which presents a spectrum that is difficult for 
the practicing analyst to understand. Additionally, a major disadvantage of PCA 
is that the principal components are constrained to be orthogonal, while any 
chemical phase that contains overlapping spectral peaks will necessarily be non- 
10 orthogonal. Therefore, subsequent post-processing of results obtained from 
PCA is useful for transforming the abstract components into physically 
meaningful factors. 

According to a method of the present invention, the principal components 
can be transformed into more physically realizable spectra by performing an 
15 MCR-ALS analysis using the PCA-factored representation of the data. The PCA- 
factored representation, TP T , can be used in place of the data matrix, D, in Eq. 

(1), according to: 

TP T =CS T (17) 

Factorization can then be accomplished by performing a constrained MCR-ALS 
20 analysis of Eq. (17). As described above, an initial feasible estimate can be 
made for S. C can then be computed, subject to constraints, according to: 



mm 

c l 



TP T -CS T 



(18) 



S can then be computed, subject to constraints, according to: 



m s in||TP T -CS T || F (19) 

25 Eqs. (18) and (19) can be iterated until a sufficient level of convergence is 

achieved. The roles of C and S can be easily interchanged by making an initial 
feasible estimate for C and first solving Eq. (19) for S. In general, a least 
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squares solution is achieved by testing for convergence, subject to constraints, 
according to: 



mm 

c,s 



TP T -CS T || (20) 



This substitution provides chemically recognizable solutions for the concentration 
matrix C and the spectral shapes matrix S. 

Preferably, the MCR-ALS analysis can be performed using the spectrally 
compressed, PCA-factored representation of the data, subject to constraints, 
according to: 



mm 

c.s 



TP T -CS T 



(21) 



10 Such an MCR-ALS computation using the r-component PCA factor model in 
place of the full spectral data matrix saves substantial memory and computation 
time. The number of elements required to describe the compressed data is (m + 
p) x r, rather than mxp elements in the raw data set, resulting in a considerable 
savings in computation time in the typical case that m, p » r. 

15 While this approach is numerically more efficient than direct computation 

via SVD, the equations, as written, may not make the most effective use of cache 
memory and may still require the entire data set D to reside in memory in order to 
form the requisite matrix products. The key to developing an efficient algorithm 
for PCA that is suitable for larger-than-memory data sets is the realization that 

20 both the formation of the crossproduct matrix in Eq. (1 1 ) and the projection of the 
data onto the loading vectors in Eq. (10) can be accomplished in a blockwise 
manner. 

Considering the case, once again, that m > p, D and T (or f ) can be 
written as conformable block matrices: 
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D = 







" V 


D 2 








and T = 








Ty-1 









(22) 



For illustration, we will assume that each of the j blocks has the same number of 
rows, m/j, but this is not a requirement in practice. The crossproduct matrix now 
becomes: 



d t d = [d; D}., D}] 



D, 



"y-1 

D; 



(23) 



f=1 



Eq. (23) shows that the crossproduct matrix D T D can be accumulated through a 
series of matrix-matrix multiplications involving the blocks of D. Both the block 
crossproduct matrix D[D,and the crossproduct matrix D T D are of size p x p. 
However, rather than hold the entire data matrix D f of size m x p, in memory, it is 

10 only necessary to hold a block data matrix D,, of size m/j x p, in memory in order 
to compute the crossproduct matrix D T D. This can be a significant memory- 
saving advantage for images having many pixels. The requirement is simply that 
the D, are suitably sized to fit within main memory of the computer. As will be 
obvious to those skilled in the art,y can be equal to one and the methods of the 

15 present invention can be applied equally well to unblocked data. 

After computing V and P in the standard way, a scores block can be 
constructed by projecting the data block D/onto P, according to: 

T, = D,P (24) 

Preferably, a compressed scores block T, can be constructed using the block 

20 version of Eq. (10). It is at this point that the larger-than-main-memory algorithm 
pays a penalty, since the data blocks D, have to be read from disk a second time. 
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The magnitude of the penalty depends on the details of the disk subsystem, total 
system memory, operating system file caching strategy, etc. However, 
experience has shown that the penalty is small in comparison to the overall 
computation time for large spectral image data sets. 
5 Those skilled in the art will appreciate that the factorization can be applied 

to the case that there are more spectral channels than pixels (i.e., p > m). In this 
later case, the data block D, would consist of the full spatial data at some number 
of spectral channels. The crossproduct matrix would be computed as DD T , and 
the roles of T and P would be reversed. That is, the eigenvectors would be 
10 associated with a single T matrix and P T would be computed blockwise, 
according to: 

^ T = T T D ; (25) 

The T matrix would have orthonormal columns and the columns of P would be 
orthogonal. The commonly assumed case of orthogonal columns for T and 
15 orthonormal columns for P could be achieved by applying a simple normalization 
to P and the inverse normalization to T. 

Data presented in a factored form according to Eq. (12) is also compatible 

with a block algorithm. A and f can be written as conformable block matrices: 



"A," 




V 


A 2 




% 




and f = 




Ay-1 




Ty-i 


.V 




_V 



20 For illustration, we will assume that each of the j blocks has the same number of 
rows, m/y, but this is not a requirement in practice. The crossproduct data factor 
matrix A T A now becomes: 
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A, 
A, 



"7-1 
A, 



= ZAJA, 



(27) 



;=1 



Eq. (27) shows that the crossproduct data factor matrix A T A can be accumulated 
through a series of matrix-matrix multiplications involving the blocks of data factor 
matrix A. Both the block crossproduct matrix AjA, and the crossproduct matrix 
A T A are of size fx f. 

Likewise, B and P can be written as conformable block matrices: 



B = 







P, 


B 2 




P 2 




and P = 




B^ 




P,-i 


.B. . 







(28) 



For illustration, we will assume that each of the k blocks has the same number of 
rows, plk, but this is not a requirement in practice. The crossproduct data factor 
matrix B T B now becomes: 



B T B = [B| BJ - BL Bl] 



B, 
B 2 

B^ 
B, 



= ZBJB ( . 



(29) 



/=1 



Eq. (29) shows that the crossproduct data factor matrix B T B can be accumulated 
through a series of matrix-matrix multiplications involving the blocks of data factor 
matrix B. Both the block crossproduct matrix B/B, and the crossproduct matrix 

B T B are of size f x f. 

After computing Y in the standard way as the solution to the generalized 
symmetric eigenvalue problem in Eq. (13), a block of loading vectors can be 
constructed, according to: 
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P,= B/Y f 



(30) 



and a scores block can be constructed, according to: 



f ; = A/(B T B)Y f 



(31) 



Furthermore, the factored representation set can be transformed, using a 
wavelet transform, and spatially compressed, as described below. If a 

transformed and spatially compressed data factor matrix A is used, a 

transformed and spatially compressed scores block T, is provided by Eq. (31). 

In many applications, it is necessary to preprocess the raw data in some 
manner prior to performing PCA. If it is desired to work with the data's 
covariance matrix or correlation matrix rather than its crossproduct matrix, for 
instance, the raw data must be mean-centered and/or variance-scaled. It may 
also be necessary to weight the data to account for non-uniform noise 
characteristics. Spectroscopic techniques that rely on photon counting, for 
example, need to scale the data to reflect the fact that the estimated variance for 
a given data element is equal to the data element itself. In all of these cases, it is 
possible either to perform the operation blockwise as the data is read, or to 
construct a transformation of the crossproduct matrix that achieves the same 
result. 

Several possible transformations of the crossproduct matrix are discussed 
in the book by Geladi and Grahn. For example, given a crossproduct matrix Z = 
D T D, the covariance and correlation matrices, Z cov and Z CO r, can be computed, 
respectively, by 




(32) 



and 




-1 



(33) 
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In these equations, d is a vector whose elements are the column-wise means of 
D. Given s as the corresponding vector of column-wise standard deviations, the 
diagonal matrix S is constructed by placing the elements of s along its diagonal. 
Clearly, Z cov and Z cor can be computed after the crossproduct matrix has 

5 been assembled if d and s are available. This is readily accomplished in the 
block algorithm by accumulating the appropriate sums as the raw data is read. If 
a row of D, dT, represents a single multivariate observation (e.g., the spectrum at 

a single pixel), then d and s are computed according to 



with all vector operations being performed element-wise. The sum of the data 
elements in Eq. (34) and the sum of the squared data elements in Eq. (35) can 
be accumulated as the individual spectra are read. The mean and standard 

15 deviation can then be computed subsequent to reading all of the spectral data. 
The covariance matrix Z cov or correlation matrix Z cor can then be used to 
compute eigenvectors and eigenvalues in an eigenanalysis, as described above. 

When the data is weighted, equivalent weighting can be applied to the 
crossproduct, if the weighting matrices are block diagonal in a way that is 

20 conformable with the way D is blocked. Typically, weighting matrices are chosen 
to be diagonal so this restriction poses little practical difficulty. In the general 
case that both the row-space and the column-space of D are to be weighted, the 

weighted data matrix D is given by Eq. (5). For m > p, the crossproduct of the 
weighted data is computed as 




(34) 



10 and 




(35) 



25 




(36) 
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10 



15 



Assuming G is blocked conformably with D: 



D = 









D 2 




G 2 0 




and G = 








0 G, 


_■>/_ 







(37) 



row weighting in Eq. (5) can be performed on-the-fly as 

d t g t gd = £d[gJg,d, 



(38) 



The G, can either be known a priori or can be computed from the corresponding 
data blocks D,. In this particular case, column weighting is applied through a 
direct transformation of the row-weighted crossproduct and no restrictions need 
to be placed how the matrix H is blocked. 

In the case that the data is presented in the factored form according to Eq. 
(12), the data factor matrices A and B can be preprocessed prior to generalized 
eigenanalysis to achieve results equivalent to those obtained through 
eigenanalysis of preprocessed raw data. For example, scores and loading 
vectors corresponding to an analysis of the data covariance matrix can be 
obtained from the solution to the generalized symmetric eigenvalue problem: 

1 



(a t A - m aa T )(b t b) = EY 



(39) 



according to: 



P = BY, and 



(40) 



In these equations, a is a vector whose elements are the column-wise means of 
20 a and 1 m is an m-vector of ones. 



27 



SD7276 
Keenan 



Clearly, Eqs. (39) and (40) are compatible with the block algorithm since 
a can be accumulated as the blocks of A are read. If a] is a row of A, then 



a = 



( 1 V2, 

\rnjM 



(41) 



Likewise, scores and loading vectors corresponding to an analysis of the data 
correlation matrix can be obtained from the solution to: 



(a t A -maa T Yb t S- 2 b) = EY 
(m-l) v A ; 



(42) 



according to: 



P = S BY, and 



(43) 



= JS (A - 1 ^ IT)(BTS " 2BK 



10 Eqs. (42) and (43) can be implemented with block algorithms by blocking the 
diagonal matrix S conformably with B 



B = 



Then, 



B 1 








B 2 




S 2 0 






and S = 




(44) 






o s k _i 




.B„_ 










B T S" 2 B = 


ZBJS: 2 B,. 


(45) 



1=1 



15 Assuming the blocks of A are read first, A T A and a have been computed, and 
is a row of B, the elements of the diagonal matrix S can be computed as the 
blocks of B are read, according to: 



c _ | bjA T Ab,.-m(bJat 
Sii= i m-1 



(46) 
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Thus, the block S, depends only on the block B,- and the requisite block matrix 
multiplications can be performed independently and accumulated according to 
Eq. (45). 

If the data is to be weighted, weighted scores and loading vectors can be 
5 obtained from the solution to generalized symmetric eigenvalue problem: 

(A T G T GA)(B T H T HB)Y = EY (47) 

according to: 

P = HBY r and (48) 
T = (GA)(B T H T HB)Y f 

10 Once again, the block algorithm can be applied in the typical case that the 

weighting matrices G and H are block diagonal in a manner that is conformable 
with A and B, respectively, according to: 

A T G T GA = £ AjGjG.A, and B T H T HB = £ BjHjHB, (49) 

The G/ and H, can be known a priori, can be read from header information stored 
15 with the factored data, or can be derived from statistical summaries of the data 
computed from A and B. 

FIG. 4 illustrates a spectral compression method of the present invention 
for raw or spatially compressed data using the block PCA algorithm for the case 
that there are more pixels than spectral channels (i.e., m > p). The method 
20 comprises a series of sequential steps that can be performed by a computer. 

At step 11, the data matrix D, comprising measured spectral data, is 
provided in disk storage. The data matrix D can comprise ; blocks of data D„ 

according to Eq. (22). Alternatively, a spatially compressed data matrix D 

comprising blocks of compressed data D, can be provided, as described below. 
25 At step 12, a block of data D, is read from disk a first time into the 

computer memory. The block is suitably sized to fit in core memory and consists 
of the full spectral data at some number of pixels. 
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If a covariance matrix is to be computed, the rows of D, can be added to 
an accumulation of the row vectors Z d, . Alternatively, if a correlation matrix is to 
be computed, the squares of the rows of D, can be added to an accumulation of 
the squared row vectors £d* . The data blocks can also be row weighted as 
5 shown in Eq. (38). 

The block crossproduct matrix DjD, is computed. The block crossproduct 
matrix DjD, is added to an accumulation of the block crossproduct matrices 
£ D[D, . If all j of the data blocks have not been read in, another data block is 
read in, another block crossproduct matrix is computed and added to the 
10 accumulation until all of the block crossproduct matrices have been accumulated, 
according to Eq. (23), to provide the crossproduct matrix D T D. 

If the crossproduct matrix is to be transformed, the column-wise means of 

the data matrix d can be computed from the accumulated rows Zd, , according 
to Eq. (34). The covariance matrix Z cov can then be computed, according to Eq. 
15 (32). Alternatively, the column-wise standard deviations of the data matrix s can 
be computed from the accumulated squared rows £d* , according to Eq. (35). 

The correlation matrix Z cor can then be computed, according to Eq. (33). If the 
data is to be column-weighted, the weighted crossproduct matrix can be 
computed according to Eq. (36). 
20 At step 13, an eigenanalysis of the crossproduct matrix D T D is performed. 

The eigenvectors V and eigenvalues E of the crossproduct matrix D T D are 
calculated, according to Eq. (1 1 ). Alternatively, the eigenvalues of the 
covariance matrix Z cov , or correlation matrix Z cor can be computed and sorted. 
If spectral compression of the PCA factors is desired, a best rank r can be 

25 determined from the eigenvalues E. A ranked loadings matrix P can be 
constructed from V based on the number of factors r desired in the model, 
according to Eq. (9). 

At step 14, a scores block T, is computed by re-reading a data block D, 
and computing the product of D, and P, according to Eq. (24). When using the 
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ranked loadings matrix P , a ranked block T, of the scores matrix can be 
computed. 

At step 15, a PCA-factored representation of the data is provided, 
comprising the conformable block scores matrix T, obtained by accumulating the 
5 scores blocks T,, according to Eq. (22); the loadings matrix P; and the 

eigenvalues E. Preferably, the ranked scores and loading vectors matrices f 

and P are truncated to contain only those vectors corresponding to the r 
significant eigenvalues. 

Alternatively, a data set may be provided wherein the data matrix D has 

10 already been factored into the product of two matrices, according to Eq. (12). 
Such factorization can be done on the full data matrix on a large computer or 
blockwise on a smaller computer using the algorithms presented herein. The 
factorization can also be effected in an adaptive manner, in which case the full 
data matrix may never be available, even in principle. Those skilled in the art will 

15 recognize that the methods of the present invention can be applied to any 

factored representation of the data, such as spectrally compressed scores and 
loading matrices, rotated PCA factors, MCR factors, etc. 

In FIG. 5 is shown a method of the present invention for obtaining the 

spectrally compressed scores matrix T and loading matrix P from the data factor 
20 matrices A and B. For example, the data factor matrices A and B can be an 

uncompressed scores matrix T and an uncompressed matrix P of loading 

vectors. The method comprises a series of sequential steps that can be 

performed on a computer. 

At step 21 , the data factor matrices A and B are provided. These matrices 
25 can be written as conformable block matrices, as in Eqs. (26) and (28). 

Therefore, the data factor matrix A can comprise ; blocks of data factors A, and 

the data factor matrix B can comprise k blocks of data factors B,. 

At step 22, a block of data factors A/ is read from a disk into the computer 

memory. The block is suitably sized to fit in core memory. The transpose of the 
30 data factor block AT is computed and the block crossproduct data factor matrix 
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AjA, is computed. The block crossproduct data factor matrix A]A i is added to 
an accumulation of the block crossproduct data factor matrices £ A^A, . If all j of 
the data factor blocks have not been read in, another data factor block is read in, 
another block crossproduct data factor matrix is computed and added to the 
5 accumulation until all of the block crossproduct data factor matrices have been 
accumulated, to provide the crossproduct data factor matrix A T A, according to 
Eq. (27). Similarly, a block of data factors B, is read in, a block crossproduct data 
factor matrix BTB, is computed, and the block crossproduct data factor matrices 
are accumulated to provide the crossproduct data factor matrix B T B, according to 
10 Eq. (29). 

At step 23, an eigenanalysis of the crossproduct matrix A T A x B T B is 
performed, according to Eq. (13). The generalized eigenvectors Y and 
eigenvalues E of the crossproduct matrix are calculated. The most significant 
eigenvectors Mf of D T D can be determined, according to Eq. (14). 
15 At step 24, a r-component PCA model for D, given r< f, can then be 

computed according to Eqs. (15) and (16). 

At step 25, an r-component, PCA-factored representation of the data is 
provided, comprising the ranked scores matrix f , the ranked blocked loadings 

matrix P , and the r significant eigenvalues E. 

20 To obtain chemically recognizable solutions for the concentration and 

spectral shapes matrices, image analysis can be performed on the PCA-factored 
representation of the data. Spectral image analysis techniques suitable for the 
present invention include PCA, weighted SVD, MCR-ALS, or other multivariate 
analysis techniques known to those skilled in the art. See, e.g., Kotula et a/., 

25 "Automated Analysis of SEM X-Ray Spectral Images: A Powerful New 
Microanalysis Tool," Microscopy and Microanalysis 9. 1 (2003). 

A preferred spectral image analysis technique is MCR-ALS. In FIG. 6 is 
shown a method for the image analysis using MCR-ALS that can be applied to 
the PCA-factored representation TP T (or the spectrally compressed, r-component 

30 PCA-factored representation TP T ). 
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At step 31, an initial spatial estimate is made for the concentration matrix 
C. This provides an initial feasible estimate for the spectral shapes matrix S at 
step 32. 

At step 35, a least squares solution for C can be computed from Eq. (18), 
5 using the PCA-factored representation of the data TP T (or r-component PCA- 

factored representation TP T ) and the initial feasible estimate for S, subject to 
constraints at step 33. 

At step 36, a least squares solution for S can be computed from Eq. (19), 
using the PCA-factored representation of the data TP T (or r-component PCA- 

10 factored representation TP T ) and the least squares solution for C from step 35, 
subject to constraints at step 33. 

At step 37, a convergence metric is computed and compared to a 
convergence criterion. If the convergence metric is not less than the 
convergence criterion, the procedure is returned to step 34 with the updated least 
15 squares solutions for S and C and steps 35 and 36 are repeated. 

If the convergence metric is less than the convergence criterion at step 37, 
an acceptable level of convergence is achieved and a least squares solution for 
S and C is provided at step 38. 

Alternatively, the roles of C and S can be easily interchanged by providing 
20 an initial feasible estimate for C at step 32 and first solving for S at step 35. 

Implementation of the Block PCA Algorithm 
The block PCA algorithm outlined in the previous section has been 
implemented and applied to several data sets ranging in size from 1024 2 to 1024 3 
(~10 6 to ~10 9 ) individual data elements. A C-language implementation of the 

25 algorithm was constructed using the Microsoft Visual C++ compiler and it makes 
extensive use of the Intel Math Kernel and Performance Primitives libraries for 
performing the linear algebra operations. The Intel library contains versions of 
the BLAS and LAPACK routines that are optimized for Intel processors. All 
calculations were performed using single precision floating point arithmetic. In 

30 designing the implementation, care was taken to select block sizes that made 
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efficient use of the cache and main memory available in the system. All of the 
example data sets represent energy dispersive x-ray spectrum images as 
acquired in a scanning electron microscope. The data are stored on disk in a 
compressed format and were read into the PCA program using a dynamic link 
5 library supplied by the manufacturer. The calculations were all performed on a 
Dell Precision 420 Workstation containing a Pentium III processor running at 1 
GHz with 1 Gbyte of main memory. 

In FIG. 7 is shown the first six principal components obtained from a PCA 
analysis of one of the spectrum images (data set "B" in Table 1). The analysis 

10 was performed on the unweighted data matrix. The sample consisted of series 
of wires having varying composition embedded in an epoxy block, which was 
then cross-sectioned and imaged with EDX. The first five components clearly 
represent the chemical information present in the data set while the sixth and 
subsequent components seemingly represent noise. Using the C-language 

15 implementation of the block PCA algorithm, all 1 024 principal components of this 
data set were computed in 50 sec. By way of comparison, computation of PCA 
using SVD in MATLAB required 599 sec. for the same data. For the significant 
components, the block PCA algorithm and SVD yielded results that are the same 
within numerical precision. 

20 In this data set, as is typical of full spectrum images, the number of 

principal components representing noise greatly outnumbers the chemically 
relevant principal components. This provides an opportunity to spectrally 
compress the data substantially by truncating the score and loading matrices to 
contain only those vectors corresponding to significant singular values (or 

25 eigenvalues). For purposes of compression it is not necessary to know exactly 
how many significant components are needed to represent the chemical 
information, but rather, to have an upper limit to that number. See R. Bro and C. 
Andersson, "Improving the speed of multi-way algorithms: Part II: Compression," 
Chemometrics and Intell. Lab. Svst . 42, 105 (1998). To evaluate the penalty 

30 exacted by computing more than the absolute minimum number of components, 
the time to compute truncated PCA models of varying size for data set "B" was 
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measured. The results are shown in FIG. 8. These results indicate that 
computation time is relatively insensitive to the model size for small numbers of 
components. This is reasonable since the time required to read the full data set 
and to compute the crossproduct matrix is independent of model size and is the 
5 dominant calculation for small models. 

Table 1 summarizes the results of timing the block PCA algorithm while 
computing 100-component PCA models for data sets varying in size by a factor 
of over 1000. The times required to complete each of the major subparts of the 
algorithm are also listed. These include reading the data from disk, forming the 

10 data crossproduct matrix, performing an eigenanalysis of the crossproduct 

matrix, reading the data a second time, if necessary, and projecting the data into 
the subspace defined by the loading vectors to compute the scores. It should be 
noted that the time required to read the data from disk a second time when a 
data set is larger than can be held in memory is the only cost that is unique to the 

15 out-of-main-memory algorithm. In other words, the times required for the other 
four steps will be the same irrespective of whether of not the full data set can be 
contained within main memory. In general, the computation times scale as 
expected for the case that the number of pixels is greater than or equal to the 
number of spectral channels. Reading the data and projecting the data onto the 

20 loading vectors takes time approximately proportional to the data size (m x p). 
Forming the crossproduct matrix, on the other hand, takes time proportional to 
the number of pixels and proportional to the square of the number of channels (m 
x p 2 ). Finally, eigenanalysis requires time proportional to the cube of the number 
of channels (p 3 ). Based on the computational complexities of the subtasks, it 

25 would be expected that for sufficiently large images, formation of the 

crossproduct matrix should become the dominant step in the calculation. This 
was observed, in practice, as shown in Table 1 . It is also noteworthy that for all 
of the images analyzed, the time to compute the crossproduct matrix was 
substantially larger than the time required to read the data from disk. This belies 

30 our intuition that an operation requiring disk access will necessarily be the slow 
step in an algorithm. 
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Data Set 


A 


B 


C 


D 


E 


F 


Number of Pixels 


1024 


16384 


65536 


65536 


262144 


1048576 


Number of Channels 


1024 


1024 


1024 


2048 


1024 


1024 


Data Size (Mbyte) 


4 


64 


256 


512 


1024 


4096 


1 st Read (sec) 


0.23 


3 


9.9 


17.8 


76 


260 


Form Cross-product (sec) 


0.66 


10.7 


42.7 


165 


171 


682 


Eigenanalysis (sec) 


3.4 


3.3 


3.4 


25.7 


3.4 


4 


2 nd Read (sec) 








0.1 


. 6 


64 


Compute Scores (sec) 


0.05 


0.7 


2.9 


5.6 


11 


48 


Total Time (sec) 


4.3 


17.7 


58.9 


214 


267 


1058 



Table 1. Times required to compute 100-component PCA models for energy dispersive x-ray 
images of various sizes. The times are broken down by the major tasks accomplished by the 
block PCA algorithm. 



The remaining discussion of the block PCA algorithm concerns its 
performance when the data sets are larger than available main memory. As 
noted above, the only performance penalty incurred by such large data sets 

5 arises from the need to read the data from disk a second time. Based on 

computational complexity arguments similar to those presented above, it would 
be expected, once again, that as data sets become larger, reading the data 
becomes relatively less important to the overall computation time. Experimental 
verification of this expectation is a more difficult here, however, owing to the lack 

10 of sufficiently large data sets and the fact that the time required to read given 
data from disk a second time depends on operating system caching strategies, 
etc. In the worst case, the time it takes to read the data twice should be no 
longer than double the time it takes to read the data once. Allowing this, we find 
that for the two data sets that do require an out-of-main-memory algorithm, data 

15 set "E" which is approximately the size of main memory, and data set "F", which 
is about 4 times larger, the expected trend is observed. Finally, for these two 
large data sets, the time penalty incurred by the block PCA algorithm amounts to 
only a few percent of the total analysis time, thus establishing the feasibility of 
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applying such techniques to the solution of very large problems of real chemical 
interest. 

Spatial Compression using Wavelets 
The spectral compression algorithms of the present invention can be 
5 combined with spatial compression algorithms, that are the subject of the related 
U.S. Patent Application SD7213, "Spatial Compression Algorithm for the Analysis 
of Very Large Multivariate Images", to provide additional computational 
efficiencies in the analysis of large, high resolution full-spectrum images. 

The goal of spatial compression is to map an image into a compressed 
10 image containing a smaller number of pixels that retain the original image's 
information content. Wavelet transforms can provide a multiresolution 
representation of the image, by hierarchically decomposing it in terms of a coarse 
overall image and successively finer levels of detail. The efficacy of a 
multiresolution technique can be judged by the degree of compression achieved 
15 during the transformation and the preservation of the information contained in the 
image following reconstruction or decompression. High compression levels can 
generally be achieved due to the large amount of redundancy in a typical 
spectrum image. Additionally, spatial compression is a filtering operation that 
improves the signal-to-noise ratio of the data. Therefore, working with spatially 
20 compressed data can provide a better result while utilizing fewer computer 
resources. 

Such multiresolution techniques are common in conventional image 
processing. For example, the JPEG2000 (Joint Photographers Expert Group) 
standard is a transform-based compression method used in digital photography 

25 applications. It uses smoothing wavelet filters and coefficient thresholding to 
separate spatial information according to frequency and orientation. The 
JPEG2000 wavelet basis is chosen to maximize compression while minimizing 
distortion as perceived by the human visual system. 

However, the JPEG compression technique may be unacceptable for 

30 scientific images intended for measurement and analysis. Other sets of basis 
functions may provide a more efficient and accurate representation of a scientific 
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image. In particular, a basis should be selected that is aligned with the 
underlying physical model. Most spectroscopic analyses follow Beer's Law, a 
linear additive model. That is, at each spatial location, the measured spectrum is 
a linear combination of pure-component spectra. 
5 The Haar wavelet transform is the simplest wavelet basis, comprising a 

square step. The Haar wavelets provide an excellent basis for the 
multiresolution processing of spectral images for compositional analysis. 
Application of the Haar transform to an image results in an overall average 
approximation of the original image and detail coefficients in order of increasing 

10 resolution. Using the Haar basis, the approximation coefficients are obtained 
simply by co-adding spectra from adjacent pixels. These approximation pixels 
will also follow the linear additive model and, as long as there is sufficient 
variation in the compressed data to span the spectral space of the original data, 
there will be no loss of spectral information. Furthermore, since the Haar basis is 

15 orthonormal, the least squares problem can be solved using the wavelet 
coefficients themselves, without having to reconstruct the data from those 
coefficients. Although the Haar wavelets provide an excellent basis for chemical 
analysis, other general wavelet transformations can also be used in the spatial 
compression algorithm described herein. 

20 In FIG. 9A is shown the EDX image of the cross-sectioned wires 

embedded in the epoxy block. In FIG. 9B is shown a one-level Haar wavelet 
transformation of the original image into four subimages by applying low pass (L) 
and high pass (H) filters to each of the two spatial dimensions. Therefore, the 
upper left quadrant (LL) of the transformed image provides an overall average, or 

25 approximation, of the original image; the lower left quadrant (LH) provides the 
vertical details; the upper right quadrant (HL) provides the horizontal details; and 
the lower right quadrant (HH) provides the diagonal details. By thresholding or 
throwing away (i.e., decimating) the detail coefficients and recursively 
decomposing the approximation subimages by repeating the filtering process on 

30 the low-low pass quadrant of the transformed image, a multiresolution pyramid 
comprising multiple levels of the approximation subimages can be constructed. 
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Accordingly, very high levels of compression of the original image can be 
achieved. 

As indicated previously, typically the goal of spectrum image analysis is to 
factor a multidimensional data set D into the product of the concentration array C 
5 and the transpose of the spectral shapes matrix S T , according to: 

D = CS T (50) 

For a 3-way data cube, D has the dimensions of m x n x p, C has the dimensions 
mxnxq and S has dimensions p x q, where m x n is the number of pixels in the 
two spatial dimensions, p is the number of channels, and q is the number of pure 
10 components. 

The first step in a typical multivariate image analysis algorithm is to unfold 
the data cube D and the concentration array C into the 2D data matrix D and the 
2D concentration matrix C according to: 

D = [vec(D. J vecfej - ■ • vec(p.J (51 ) 

15 and 

C = [vec(C.Jvec(cJ-.vec(c„JJ (52) 

then 

min||D-CS T || =minD-CS T || (53) 



— Hp 



F 



which can be solved by normal linear algebra. 
20 The Haar transform can be implemented as an orthogonal matrix as: 

' V2Li, /2 ®[i,-i] 



(54) 



where I is an identity matrix, <8> is the Kronecker product, and £ is the length of 
the vector to be transformed. The Haar matrices are applied independently to 
the rows and the columns of the data matrices. As shown conceptually in FIG. 
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10, the Haar transform can be applied channel-wise to an m-row x n-column x p- 
channel data set D according to: 

W m xD.. fc xW n T =D.. k (55) 

or 

5 (W n <8>Wjxvec(D.J = vec(jLj (56) 

for the /c" 1 image plane. W m is the wavelet transform across the rows and Wj is 
the wavelet transform down the columns, and each image plane is treated 
independently. Therefore, the 2D transformed data matrix D can be computed 
as: 

10 (W„®W m )xD = D (57) 

The Haar transform combines and permutes the elements of the data matrix into 
wavelet coefficients of an approximation matrix D a and detail matrices which can 

be adjoined to form a single detail matrix D d . That is, the wavelet transformation 

partitions the matrix D , according to: 



15 D = 



a 

A, 



(58) 



Especially if there are not too many sharp edges in the spectral image, 
most of the information (i.e., energy) will be contained in the approximation 

coefficients D a (i.e., the upper left quadrant in FIG. 9B). The Haar approximation 

coefficients D a simply represent spectral mixing on a broader scale. If the linear 

20 additive model holds and there remains sufficient variation in these "super" pixels 
to span the spectral space, there will be no loss of spectral information by setting 

D d to zero for purposes of spectral image analysis. Of course, prior to setting 

them to zero, the details can be stored using standard data compression 
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techniques to enable inverse wavelet transformation to recover the concentration 
maps at the original spatial resolution. 

Because the wavelet coefficients for a particular pixel depend solely on 
the pixels in a small neighborhood surrounding that pixel, only that neighborhood 
of pixels needs to be in memory at any given time. In the case of the Haar 
transform, the mapping of the pixel locations in the original image to pixel 
locations in the compressed image is known a priori for any arbitrary level of 
compression. Consequently, as each individual spectrum is read, it can be 
accumulated immediately into the appropriate compressed pixel and then 
discarded. Therefore, the Haar wavelet transform can be computed on-the-fly, 
such that the entire data set or image never needs to reside simultaneously in 
memory. This allows even larger problems to be solved on a given computer 
and enables faster computations. 

Because the wavelet transform is linear and the Haar basis is 
orthonormal, a least squares analysis can be performed on the wavelet 
coefficients rather than the image data. By recognizing that (W n ® W m ) is an 
orthogonal matrix, the MCR-ALS analysis can be accomplished in terms of the 
transformed data, according to: 



min||D-CS T || F = min||(W n ® 



(59) 



20 subject to transformed constraints. For example, assuming a constraint of non- 
negativity of concentrations (i.e., C > 0), the transformed non-negativity 
constraint becomes: 



(W n ®Wj T C>0 



(60) 



where C is a transformed concentration matrix. 
25 If the detail coefficients are set to zero, the least squares problem in Eq. 

(59) becomes 



min 


D-CS T 


= min 








S T 






F 


0 




A. 





= minp a -C a S T 



(61) 
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where C a and C d represent the contributions to the overall concentration that 

arise from the wavelet approximation and detail coefficients, respectively. The 
Haar approximation coefficients are necessarily non-negative also, because they 
are simply summations of concentration elements that are themselves 
5 constrained to be non-negative. Therefore, using the Haar basis and decimating 
all of the detail coefficients reduces the transformed constraint in Eq. (60) to 

simple non-negativity (i.e., C a > 0 ). 

Therefore, an equivalent solution to the MCR-ALS problem can be 
obtained working with either the original data or the wavelet coefficients. The 

10 advantage of working in the wavelet domain is that most of the spatial 

information of interest is contained in a few significant wavelet coefficients, much 
as most of the spectral information is contained in a few principal components in 
the PCA factoral representation. By thresholding the coefficients, the interesting 
spatial information can be concentrated in a reduced number of significant 

15 wavelet coefficients that is small compared to the number of original pixels. In 
particular, the MCR-ALS analysis can be performed on a compressed data 
matrix consisting of only the approximation coefficients. For example, using only 
the approximation coefficients at level two in both spatial dimensions and 
decimating the details enables the computation to be performed using only 1/16 

20 the number of coefficients as original pixels. 

Finally, if the individual pixels follow Beer's law, the Haar approximation 
coefficients do also. Therefore, the approximation coefficients are given, within a 
normalization factor, as a sum of adjacent pixels. That is, a level one 
decomposition in both rows and columns results in a summing over a 2 x 2 pixel 

25 neighborhood. From a physical viewpoint, the unmixing problem, or MCR, 
assumes that the signal arising from adjacent pixels is a linear combination of 
pure component spectra weighted by the amount of each component in the 
mixed pixel. As long as the linear model holds and a sufficient number of linearly 
independent mixtures are retained in the approximation, analysis of the 

30 approximation coefficients alone will provide a satisfactory curve resolution or 
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spectral unmixing. A preferred level of compression maximizes the compression 
while still retaining sufficient variation in the compressed pixels to span the 
spectral space of the original data. Therefore, good results can generally be 
obtained as long as the number of compressed pixels remains substantially 
5 larger than the number of spectral pure components. 

The spatial compression algorithm can also be applied to any arbitrary 
factored representation of the data, according to Eq. (12), where the data factor 
matrix A comprises the spatial information and the data factor matrix B 
comprises the spectral information. The data factor matrix A can be transformed, 

10 using a wavelet transform, to provide a transformed data factor matrix A . The 

wavelet coefficients of the transformed data factor matrix A can be spatially 
compressed by thresholding the wavelet coefficients. An image analysis can be 

performed using the transformed data factor matrix A and the data factor matrix 

B to provide a transformed concentration matrix C and a spectral shapes matrix 
15 S. A concentration matrix C can then be computed from the transformed 

concentration matrix C . Furthermore, the data factor matrices A and B can be 
blocked and the concentration matrix C can be accumulated blockwise. 

Preferably, the spatial compression algorithm using wavelets can be 
combined with the spectral compression algorithm to provide even greater 
20 computational efficiencies. In summary, the spectrally compressed data can be 
spatially compressed, according to: 

D=(W n ®W m )xD = (W n ®W m )xTP T =TP T (62) 
Image analysis, preferably using MCR-ALS, can then be applied to the spectrally 
and spatially compressed data representation TP T , according to: 



25 min 



D-CS t || f = min||(W„ ® W m )(f P T -CS T )[ = min||f a P T -C a S T 



(63) 



subject to appropriate constraints. Again, the detail coefficients have been set to 
zero. 
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25 



Image analysis, according to either Eq. (61) or Eq. (63), provides an 
estimated spectral shapes matrix S, comprising a set of pure component spectra, 

and a spatially compressed concentration matrix C a . The goal of decompression 

or reconstruction is to estimate a components map at the full spatial resolution. 
Such a components map can be obtained from the concentration matrix C. 
The concentration matrix C can be computed from the compressed 

concentration matrix C a by inverse wavelet transformation. In this context, C a 

represents the approximation coefficients as transformed into the spectral 
shapes basis defined by S. Inverse wavelet transformation involves rereading 
the detail coefficients back into memory and transforming them to the spectral 
basis defined by S, according to: 



C d =D d S 



(64) 



Then, the component maps at full spatial resolution are obtained by performing 
the inverse wavelet transform: 



c = (w„®w m )- 1 x 



(65) 



Alternatively, because the spectral shapes matrix S in Eq. (53) is 
equivalent to the spectral shapes matrix S estimated from the compressed data 
in Eq. (61 ) or (63), C can also be computed directly from the original data D (or 
the PCA-factored representation of the data TP T ). Therefore, a preferred 
approach is to project the original data set D (or the product of T and P) onto the 
estimated spectral shapes matrix S by solving the least squares problem 



min||D-CS T || = mini 
c II Hf c I 



TP t -CS t || f 



(66) 



subject to appropriate constraints. This approach is especially useful if the 
wavelet detail coefficients are discarded. 

Furthermore, the solution to Eq. (66) can be obtained by reading the data 
blocks D, (or the scores blocks T,) sequentially and solving the least squares 
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problem blockwise. The concentration matrix C can then be accumulated as 
concentration blocks C,, according to: 

C = [C, C 2 C,] (67) 

Therefore, the full spatial and spectral resolution can be achieved in an 
5 acceptable MCR-ALS analysis, at the cost of reading the data twice. 

FIG. 11 illustrates conceptually a preferred method for the spatial 
compression of raw or spectrally compressed data using wavelets. The spatial 
compression will be described as applied to the data matrix D, although those 
skilled in the art will realize that spatial compression can be applied likewise to 

10 the PCA-factored representation of the data TP T or f P T . The mapping of a pixel 
at the original spatial resolution to the corresponding pixel at the compressed 
resolution can be calculated using a computer. Therefore, the is no need to 
actually do the folding and unfolding in the computer implementation, as depicted 
conceptually in steps 42 - 46. Furthermore, this means that the wavelet 

15 transformation can be done blockwise to the data matrix D (or the PCA-factored 
representation of the data). Those skilled in the art will realize that other 
manipulations, such as weighting, can also be used with the spatial compression 
algorithm. 

At step 41 , the full 2D data matrix D (or a scores matrix T or f ) is 
20 provided in disk storage. The matrix D can comprise block matrices. 

At step 42 is shown conceptually a folded 3-way data cube D , comprising 
an m-row x n-column x p-channel data set, that is represented by the data matrix 
D in step 41 , according to Eq. (51 ). 

The wavelet transform can be applied channel-wise to the folded data 
25 cube D , according to Eq. (55), and as illustrated conceptually in FIG. 10. As 
implemented on a computer, the wavelet transformation would actually be 
applied to the unfolded data matrix D, according to Eq. (57). 

At step 43 is shown an exemplary two-level wavelet transform of the data 
cube D , comprising the second-level approximation in the upper left hand 
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quadrant of the second-level transformed subimages (i.e., LL 2 ), the second-level 
details (i.e., HL 2 , LH 2 , and HH 2 ), and the first-level details (i.e., HLi, LHi, and 
HHi). Those skilled in the art will realize that arbitrary levels of compression can 
be applied to the original image. 

At step 44, the exemplary second-level transformed image can be 
decimated by throwing away the first- and second-level details to leave only the 
second-level approximation coefficients. Additionally, at step 45, the detail 
coefficients can be quantized, thresholded, and encoded and the retained detail 
coefficients can be stored. 

At step 46, the approximation coefficients can be unfolded to provide the 

spatially compressed 2D data matrix D a (or T a ). 

The spatially compressed data matrix D a can be analyzed by standard 
image analysis methods. Preferably, MCR-ALS is performed on the spatially 
compressed data set D a , or the spatially compressed PCA-factored 

representation of the data f a P T , to provide a spectral shapes matrix S and a 

compressed concentration matrix C a , according to Eq. (61) or (63). 

In FIG. 12 is shown conceptually two methods for computing the 
concentration matrix C from the spatially compressed data and reconstructing the 
component maps at full spatial resolution. 

In method 50, the compressed concentration matrix C a is decompressed 

using an inverse wavelet transform to obtain the concentration matrix C. Since 
the mapping of pixels can be done on a computer, there is no need to actually do 
the folding and unfolding, as depicted conceptually in steps 52 - 56, in the 
computer implementation. 

At step 51, the compressed concentration matrix C a is provided. 

At step 52 is shown conceptually the folded second-level approximation 

subimages (i.e., LL 2 ) of the compressed concentration matrix C a . 
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At step 53, the stored detail coefficients are decoded, unfolded, projected 
onto S, and refolded. 

At step 54, the refolded concentration details (/.e. f HL 2 , LH 2 , HH 2 , HLi, 
LH^ and HHi) are added to the second-level approximation subimages to 
5 provide the second-level images of the compressed concentration matrix. 

At step 55, the inverse wavelet transformation is applied to the second- 
level compressed concentration matrix to provide the folded concentration array 
C. 

At step 56, the folded concentration array C is unfolded to provide the 
10 concentration matrix C, according to Eq. (52). A component map can be 
reconstructed from the concentration matrix C. 

In method 60, the concentration matrix C is computed by projection of the 
data matrix D (or the PCA-factored representation of the data matrix TP T ) onto 
the estimated spectral shapes matrix S. 
15 At step 61 , if the data has been spectrally compressed and the analyst is 

working with a PCA-factored representation of the data TP T , the analyst 
proceeds to step 62. If the analyst is working with the original data set D, the 
analyst proceeds to step 68. 

At step 62, a scores block T, of a total of j scores blocks is read into 
20 memory. 

At step 63, a least squares solution is obtained for a concentration block 
C; by projecting the product of the scores block T, and P onto the estimated 
spectral shapes matrix S, according to Eq. (66) and subject to constraints at step 
64. 

25 At step 65, the concentration blocks C, are accumulated. 

At step 67, the steps 61 - 65 are repeated until all of the j scores blocks 
have been read into memory and all j concentration blocks have been 
accumulated. 

At step 56, the concentration matrix C is provided from the accumulation 
30 of concentration blocks, according to Eq. (67). A component map can be 
reconstructed from the concentration matrix C. 
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Alternatively, at step 61 , if the analyst is working with the original data set 
D, the analyst proceeds to step 68. 

At step 68, a data block D, of a total of j data blocks is read into memory. 

At step 69, a least squares solution is obtained for a concentration block 
5 C/ by projecting the data block D, onto the estimated spectral shapes matrix S, 
according to Eq. (66) and subject to constraints at step 64. 

At step 65, the concentration blocks C, are accumulated. 

At step 67, the steps 61 - 65 are repeated until all j data blocks have been 
read into memory and all j concentration blocks have been accumulated. 
10 At step 56, the concentration matrix C is provided from the accumulation 

of concentration blocks, according to Eq. (67). A component map can be 
reconstructed from the concentration matrix C. 

Implementation of the Spatial Compression Algorithm 
FIGs. 13A-13D show an example of the MCR-ALS analysis of a spatially 

15 compressed energy dispersive X-ray (EDX) image of a complex geological 
material. FIG. 13A shows a scanning electron micrograph (SEM) of the 
geological sample, which is a primary tellurium ore in the form of PbTe. Expert 
geological examination indicated that this material contained about thirteen 
distinct chemical phases. EDX spectrum images were acquired for a 1 mm field 

20 of view of the sample at a 512 x 512 pixel resolution (i.e., a resolution of about 2 
^m per pixel). A 1024-channel spectrum was collected at each pixel, with about 
100 counts per spectrum on average. This resulted in an approximately 1 Gbyte 
data set in single precision floating point numbers. 

The EDX spectrum images were spatially compressed to the level four 

25 approximation in both spatial dimensions and analyzed at a 32 x 32 resolution. 
The MCR-ALS analysis of the spatially compressed data was completed in about 
3 minutes on a 1 GHZ computer having a single Gbyte of main memory. In FIG. 
13B is shown the results of the MCR-ALS analysis of the compressed data. The 
analysis gave excellent, physically meaningful results consistent with the expert 

30 geological examination. Using both spatial and spectral compression, the 
complex geological sample was analyzed in less than two minutes. 
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The MCR-ALS analysis with spatial compression retains the fine spatial 
details of the original image. In FIGs. 13C and 13D are shown the analysis of the 
silicate component in the geological sample. FIG. 13C shows the 32 x 32 pixel 
compressed data image of the silicate component that was analyzed by MCR- 
5 ALS. FIG. 13D shows the reconstructed silicate image after projection of the 
original data onto the estimated spectral shapes matrix derived from the MCR- 
ALS analysis of the compressed data matrix. The reconstructed image of the 
silicate component map retains the full spatial resolution of the original data. 
An additional benefit that can be obtained using wavelets is improved 

10 sensitivity to the pure spectral components. By co-adding pixels to provide the 
approximation coefficients, the signal-to-noise ratio can be improved. For a 
homogeneous region, the improvement will go as the square root of the number 
of pixels added. Therefore, high-resolution images collected at a low signal-to- 
noise (S/N) ratio can achieve results equivalent to low spatial resolution, high S/N 

15 images. 

FIG. 14 shows scaled eigenvalues for the sorted principal components of 
the uncompressed image of the geological sample and four levels of Haar 
wavelet decomposition in both spatial dimensions. In such a plot, eigenvalues 
representing noise components should fall on a nearly horizontal line. 

20 Components representing systematic chemical variation in addition to noise 

cause a positive deviation from the line. Thus, the number of chemically relevant 
factors can be estimated by finding the break in the eigenvalue plot. Ten 
principal components representing chemical information could be detected by 
MCR-ALS analysis of the uncompressed 512 x 512 data set. Additional pure 

25 components could be detected by analysis of the spatially compressed images. 
For example, 15 pure components were detected at a level two and higher 
compression (i.e., a 32 x 32, 64 x 64, or 128 x 128 approximation image). In 
these cases, the improved S/N resulting from spatial compression has enabled 
the signal arising from minor chemical components to become significant with 

30 respect to noise. 
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Furthermore, because the discrete wavelet transform is separable, the 
rows and columns can be treated separately. Therefore, the spatial filter can be 
matched to the spatial characteristics of the sample. FIGs. 15A and 15B show 
partial results from MCR-ALS analysis of an EDX spectral image of an 
5 alumina/braze interface. FIG. 15A shows two components, the alumina phase 
and an interspersed glass particulate phase, when analyzed at the original, 
uncompressed 128 x 128 resolution. FIG. 15B shows the two corresponding 
components obtained after spatially compressing the spectral image to level 
seven along the rows, but remaining uncompressed down the columns (i.e., a 

10 128 x 1 approximation image). Such an asymmetric filter improves sensitivity to 
features that look like horizontal lines. In the case of the braze interface, 
applying this asymmetric filter has enabled the detection of a real, differential 
self-absorption process occurring at the interface. At the same time, spectral 
contrast arising from the glass particulates has been reduced since the small 

15 globular shapes of the particulates are spatially mismatched to a horizontal line. 

It will be understood that the above description is merely illustrative of the 
applications of the principles of the present invention, the scope of which is to be 
determined by the claims viewed in light of the specification. The invention has 
been described as spectral compression algorithms for the analysis of very large 

20 multivariate images. Other variants and modifications of the invention will be 
apparent to those of skill in the art. 
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